In the transition from tabular methods to Function Approximation, we confront the reality that for most complex environments, a lookup table is physically impossible. When the number of states $n$ reaches astronomical scales, we must represent the value function as a parameterized mapping: $v_{\pi}(s) \approx \hat{v}(s, \mathbf{w})$.
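To make the parameterized mapping concrete, here is a minimal sketch of a linear approximator $\hat{v}(s, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$ trained with a semi-gradient TD(0) step. The feature extractor, learning rate, and toy transition are illustrative assumptions, not a prescribed design:

```python
import numpy as np

def features(state, num_features=4):
    """Hypothetical feature map: a deterministic vector per state id."""
    rng = np.random.default_rng(state)
    return rng.random(num_features)

def v_hat(state, w):
    """Parameterized value estimate: a dot product replaces the table lookup."""
    return features(state) @ w

def td0_update(w, s, reward, s_next, alpha=0.1, gamma=0.99):
    """One semi-gradient TD(0) step: nudge w toward the bootstrapped target."""
    target = reward + gamma * v_hat(s_next, w)
    error = target - v_hat(s, w)
    return w + alpha * error * features(s)

w = np.zeros(4)
for _ in range(100):
    w = td0_update(w, s=0, reward=1.0, s_next=1)
```

Note that the same weight vector serves every state, which is exactly where generalization comes from.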
The Feasibility Inquiry
You might ask: "Is there any reason to think this might be possible?" Can we really represent $10^{170}$ board states with just a few million weights? The answer lies in the regularity of our world: similar states usually have similar values. By normalizing features into a stable range like [0, 1], we allow the agent to detect patterns, like "territory control" in Go, that apply to billions of configurations it has never explicitly seen.
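The normalization step mentioned above can be sketched as a simple min-max scaling; the bounds and the two "Go-like" feature vectors below are invented purely for illustration:

```python
import numpy as np

def min_max_normalize(x, lo, hi):
    """Scale each feature into [0, 1] given known per-feature bounds."""
    return (x - lo) / (hi - lo)

# Hypothetical per-feature bounds, e.g. counts on a 19x19 board (361 points)
lo = np.array([0.0, 0.0])
hi = np.array([361.0, 361.0])

# Two similar positions described by invented (stone count, territory estimate)
state_a = min_max_normalize(np.array([120.0, 80.0]), lo, hi)
state_b = min_max_normalize(np.array([122.0, 83.0]), lo, hi)

# Nearby normalized features yield nearby value estimates under a smooth v_hat
distance = np.linalg.norm(state_a - state_b)
```

Because both vectors land close together in [0, 1] feature space, any smooth approximator will assign them similar values, which is the regularity the text appeals to.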
Generalization vs. Discrimination
- Generalization: The superpower of FA. Learning about state A informs the estimate for a similar state B. This is the only way to scale.
- Discrimination: The ability to tell two states apart. Tabular methods have perfect discrimination but zero generalization; FA trades discrimination for the ability to predict the unknown.
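The trade-off above can be seen in miniature by comparing a lookup table with a coarse state-aggregation approximator; the state count and group size here are arbitrary choices for the sake of the example:

```python
import numpy as np

NUM_STATES = 1000
GROUP_SIZE = 100                              # 10 groups of 100 states each

table = np.zeros(NUM_STATES)                  # perfect discrimination, no generalization
groups = np.zeros(NUM_STATES // GROUP_SIZE)   # generalization within each group

def update(state, target, alpha=0.5):
    """Apply the same incremental update to both representations."""
    table[state] += alpha * (target - table[state])
    g = state // GROUP_SIZE
    groups[g] += alpha * (target - groups[g])

update(state=42, target=1.0)

# Tabular: only state 42 moved. Aggregated: every state in group 0 moved,
# including state 43, which was never visited.
```

One update under aggregation informs a hundred states at once, at the cost of no longer being able to tell states 42 and 43 apart.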